Terminology extraction: an analysis of linguistic and statistical approaches

Authors

  • Maria Teresa Pazienza
  • Marco Pennacchiotti
  • Fabio Massimo Zanzotto
Abstract

Are linguistic properties and behaviors important for recognizing terms? Are statistical measures effective for extracting terms? Is it possible to capture a sort of termhood with computational linguistic techniques? Or are terms too sensitive to exogenous and pragmatic factors that cannot be confined within computational linguistics? All these questions are still open. This study tries to contribute to the search for an answer, in the belief that it can be found only through a careful experimental analysis of real case studies and a study of their correlation with theoretical insights.

Similar resources

Linguistic and Statistical Approaches to Basque Term Extraction

The development of applications for terminology extraction in Basque demands previous research on linguistic techniques, in order to fulfil the requirements of Basque language processing. Because Basque is an agglutinative language, the results of purely statistical methods are not satisfactory or suitable for term extraction. In this work, we have adopted a hybrid approach, based on the selection of...

EχATOLP – An Automatic Tool for Term Extraction from Portuguese Language Corpora

This paper describes EχATOLP, a software tool to extract significant terms from an annotated corpus written in Portuguese about a specific domain of interest. Being based on linguistic and statistical approaches, this tool extracts terms that are frequent and syntactically relevant to the domain of interest.

Combined approach for terminology extraction: lexical statistics and linguistic filtering

This paper describes the automatic extraction of the terminology of a specific domain from a large corpus. The use of statistical methods yields a number of solutions, but these produce a considerable amount of noise. The task we have concentrated on is the creation and testing of an original method to reduce high noise rates by combining linguistic data and statistical methods. Starting from a ...

Multi-word Term Extraction for Bulgarian

The goal of this paper is to compile a method for multi-word term extraction, taking into account both the linguistic properties of Bulgarian terms and their statistical rates. The method relies on the extraction of term candidates matching given syntactic patterns followed by statistical (by means of Log-likelihood ratio) and linguistically (by means of inflectional clustering) based filtering...

A Multi-Word Term Extraction Program for Arabic Language

Terminology extraction commonly includes two steps: identification of term-like units in the texts, mostly multi-word phrases, and the ranking of the extracted term-like units according to their domain representativity. In this paper, we design a multi-word term extraction program for Arabic language. The linguistic filtering performs a morphosyntactic analysis and takes into account several ty...

Publication date: 2004